1 research outputs found

    Arabic Typed Text Recognition in Graphics Images (ATTR-GI)

    Get PDF
    While optical character recognition (OCR) techniques may perform well on standard text documents, their performance degrades significantly in graphics images. In standard scanned text documents OCR techniques enjoy a number of convenient assumptions such as clear backgrounds, standard fonts, predefined line orientation, page size, the start point of written. These assumptions are not true in graphics documents such as Arabic advertisements, personal cards, screenshot. Therefore, in such types of images, greater attention is required in the initial stage of detecting Arabic text regions in order for subsequent character recognition steps to be successful. Special features of Arabic alphabet characters introduce additional challenges which are not present in Latin alphabet characters. In this research we propose a new technique for automatically detecting text in graphics documents, and preparing them for OCR processing. Our detection approach is based on some mathematical measurements to know is it a text or not and to know is it Arabic Based Text or Latin Based. These measurements are follows, measure the Base Line (the line has maximum number of black pixels). Also, measure Item Area (the content of extracted sub images). Finally, find maximum peak for the adjacent black pixels in Base line and maximum length for sub adjacent black pixels. Our experiment results will come in more details. We believe our technique will enable OCR systems to overcome their major shortcoming when dealing with text in graphics images. This will further enable a variety of OCR-based applications to extend their operation to graphics documents such as SPAM detection from image, reading advertisement for blind people, search and index document which contain image, enhancing for printer property (black white or color printer) and enhancing OCR
    corecore